Data Integration and Pattern-Finding in Biological Sequence with TESS's Annotation Grammar and Extraction Language (AnGEL)

نویسندگان

  • Jonathan Schug
  • Max Mintz
  • Christian J. Stoeckert
چکیده

Decoding the functional elements in an organism’s genome requires the integration of a wide variety of experimental and computational data from a wide range of sources. The location of this data, viewed as sequence features in the genome, must serve as one of the essential organizing principles for this integration. It is therefore important to have a data integration system that takes advantage of this fact. As part of the TESS project, we have developed a grammar-based data integration and pattern search tool, Annotation Grammar and Extraction Language (AnGEL), that follows this principle. AnGEL can represent most of the current work in cis-regulatory module (CRM) modelling in an intuitive way and can process data extracted from a variety of sources simultaneously. Here we describe AnGEL’s capabilities and illustrate its use by querying for gene arrangements, CRMs, and protein domain structure.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...

متن کامل

An Exploration of Teachers' Beliefs about the Role of Grammar in Iranian High Schools and Private Language Institutes

This study was an attempt to explore the beliefs of Iranian EFL teachers about the role of grammar in English language teaching in both state schools and private language institutes. Data were collected through a questionnaire developed by Burgess and Etherington (2002), which consisted of 11 main subscales and was divided into two sections. The first section dealt with approaches to grammar te...

متن کامل

Annotation and Issues in Building an English Dependency Treebank

The Paninian Grammar framework, given by Panini for his analysis of Sanskrit Language, is finding its extensive application on languages other than Sanskrit, about two thousand five hundred years after its formulation. The work presented in this paper is one such application that extends Paninian Grammar (PG or CPG: Computational Paninian Grammar) to English, a fixed word order language. It pre...

متن کامل

Finding Exact and Solo LTR-Retrotransposons in Biological Sequences Using SVM

Finding repetitive subsequences in genome is a challengeable problem in bioinformatics research area. A lot of approaches have been proposed to solve the problem, which could be divided to library base and de novo methods. The library base methods use predetermined repetitive genome’s subsequences, where library-less methods attempt to discover repetitive subsequences by analytical approach...

متن کامل

A framework for traversing dense annotation lattices

Pattern matching, or querying, over annotations is a general purpose paradigm for inspecting, navigating, mining, and transforming annotation repositories—the common representation basis for modern pipelined text processing architectures. The open-ended nature of these architectures and expressiveness of feature structure-based annotation schemes account for the natural tendency of such annotat...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007